

Compiled by 圈圈, 计量经济圈, 2019-06-30


Total Least Squares



Total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models.
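
As an illustration of the linear case, the sketch below uses the classical SVD-based solution. The function name `tls_fit`, the absence of an intercept term, and the assumption that the errors in every column have equal variance are choices made for this example, not something stated above.

```python
import numpy as np


def tls_fit(X, y):
    """Total least squares estimate of b in y ≈ X @ b (no intercept).

    Assumes more rows than columns and equal error variance in X and y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    p = X.shape[1]
    # Stack predictors and response so that errors in both are treated alike.
    Z = np.hstack([X, y])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    v = Vt[-1]  # right singular vector of the smallest singular value
    if abs(v[p]) < 1e-12:
        raise ValueError("no finite total least squares solution for this data")
    return -v[:p] / v[p]


# Points lying exactly on y = 2x recover the slope 2.
print(tls_fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]))
```

To fit an intercept with this sketch, center the columns of `X` and `y` first and recover the intercept from the means.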

Deming Regression


2. Passing-Bablok regression is applicable to


Deming regression, named after W. Edwards Deming, is an errors-in-variables model which tries to find the line of best fit for a two-dimensional dataset. It differs from simple linear regression in that it accounts for errors in observations on both the x- and y-axes. It is a special case of total least squares, which allows for any number of predictors and a more complicated error structure.


Deming regression is equivalent to the maximum likelihood estimation of an errors-in-variables model in which the errors for the two variables are assumed to be independent and normally distributed, and the ratio of their variances, denoted δ, is known. In practice, this ratio might be estimated from related data sources; however, the regression procedure takes no account of possible errors in estimating this ratio.
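
A minimal sketch of the closed-form Deming estimate follows. It assumes δ is defined as the variance of the y-errors divided by the variance of the x-errors (the text above does not say which way the ratio goes), and the function name `deming_fit` is made up for this example. With `delta=1.0` the fit reduces to orthogonal regression.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of the Deming regression line.

    delta is ASSUMED to be var(y errors) / var(x errors).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance.
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Points lying exactly on y = 1 + 2x give back that line for any delta.
print(deming_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], delta=1.0))
```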


Deming regression: http://pan.baidu.com/s/1bptPygf

Least Angle Regression

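Least angle regression (LARS) is a variable selection method proposed by Efron et al. in 2004. In form it resembles forward stepwise regression. Viewed in terms of how the solution is computed, it is an efficient way of solving lasso regression.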
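
To make the stepwise flavor concrete, here is a rough sketch of plain LARS, without the lasso modification. It assumes the columns of `X` are centered and scaled to unit norm, `y` is centered, and `X` has full column rank with more rows than columns. The name `lars_sketch` is made up for this example, and the code is meant for reading, not production use.

```python
import numpy as np


def lars_sketch(X, y):
    """Plain LARS; returns the coefficient vector after each step."""
    n, p = X.shape
    beta = np.zeros(p)
    mu = np.zeros(n)  # current fitted values
    active = []       # indices of predictors selected so far
    path = [beta.copy()]
    for _ in range(p):
        c = X.T @ (y - mu)  # correlations with the current residual
        inactive = [j for j in range(p) if j not in active]
        # The inactive predictor most correlated with the residual enters.
        j_new = max(inactive, key=lambda j: abs(c[j]))
        active.append(j_new)
        C = abs(c[j_new])
        s = np.sign(c[active])
        XA = X[:, active] * s  # sign-adjusted active columns
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        AA = 1.0 / np.sqrt(g.sum())
        w = AA * g
        u = XA @ w  # direction equiangular to all active columns
        a = X.T @ u
        # Default: full step to the least squares fit on the active set.
        gamma = C / AA
        # Otherwise stop as soon as an inactive predictor ties in correlation.
        for j in range(p):
            if j in active:
                continue
            for cand in ((C - c[j]) / (AA - a[j]), (C + c[j]) / (AA + a[j])):
                if 1e-12 < cand < gamma:
                    gamma = cand
        mu = mu + gamma * u
        beta[active] += gamma * s * w
        path.append(beta.copy())
    return np.array(path)
```

Each row of the returned array has one more nonzero coefficient than the previous one, and under the assumptions above the last row coincides with the ordinary least squares fit.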


The advantages of the LARS method are:

1. It is computationally just as fast as forward selection.

2. It produces a full piecewise linear solution path, which is useful in cross-validation or similar attempts to tune the model.

3. If two variables are almost equally correlated with the response, then their coefficients should increase at approximately the same rate. The algorithm thus behaves as intuition would expect, and also is more stable.

4. It is easily modified to produce solutions for other estimators, like the lasso.

5. It is effective in contexts where p >> n (i.e., when the number of dimensions is significantly greater than the number of points).

The disadvantages of the LARS method include:

1. With any amount of noise in the dependent variable and with high dimensional multicollinear independent variables, there is no reason to believe that the selected variables will have a high probability of being the actual underlying causal variables. This problem is not unique to LARS, as it is a general problem with variable selection approaches that seek to find underlying deterministic components. Yet, because LARS is based upon an iterative refitting of the residuals, it would appear to be especially sensitive to the effects of noise. This problem is discussed in detail by Weisberg in the discussion section of the Efron et al. (2004) Annals of Statistics article. Weisberg provides an empirical example, based upon re-analysis of data originally used to validate LARS, in which the variable selection appears to have problems with highly correlated variables.

2. Since almost all high dimensional data in the real world will just by chance exhibit some fair degree of collinearity across at least some variables, the problem that LARS has with correlated variables may limit its application to high dimensional data.



    1) It is especially well suited to cases where the feature dimension n is much larger than the number of samples m.







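    In addition, this article does not cover how least angle regression computes the actual values of the parameter θ; it only touches on the principle. If you are interested in the detailed derivation, see Bradley Efron's paper "Least Angle Regression", which is easy to find online.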

Isotonic Regression



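Isotonic regression is a regression algorithm. Given a finite set of real numbers Y representing the observed responses and a set X representing the unknown response values to be fitted, the fit is obtained by finding the values of X that minimize an objective function.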
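
For reference, the objective is commonly written as a weighted sum of squares under an ordering constraint. The weights w_i and the index range 1..n are notation assumed here; they are not defined in the text above.

```latex
\min_{x_1,\dots,x_n} \; f(x) = \sum_{i=1}^{n} w_i \,(y_i - x_i)^2
\quad \text{subject to} \quad x_1 \le x_2 \le \dots \le x_n, \qquad w_i > 0 .
```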




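1. If the prediction input exactly matches a training feature, the associated prediction is returned. If several predictions match that training feature, one of them is returned.

2. If the prediction input is lower or higher than all training features, the prediction for the lowest or highest training feature is returned, respectively. If several predictions share that feature, the lowest or highest of them is returned, respectively.

3. If the prediction input falls between two training features, the prediction is treated as a piecewise linear function, and the interpolated value is computed from the predictions of the two closest training features.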
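
The three prediction rules can be sketched in a few lines on top of a pool-adjacent-violators fit. This is an unweighted illustration that assumes all training features are distinct; the names `fit_isotonic` and `predict_isotonic` are made up for this example and do not belong to any particular library.

```python
from bisect import bisect_left


def fit_isotonic(xs, ys):
    """Pool adjacent violators; returns (sorted features, non-decreasing fit)."""
    pairs = sorted(zip(xs, ys))
    blocks = []  # each block holds [sum of y, number of points]
    for _, y in pairs:
        blocks.append([y, 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [x for x, _ in pairs], fitted


def predict_isotonic(xs, fitted, x):
    """Apply the three prediction rules to a single input x."""
    # Rule 2: inputs outside the training range take the boundary prediction.
    if x <= xs[0]:
        return fitted[0]
    if x >= xs[-1]:
        return fitted[-1]
    i = bisect_left(xs, x)
    # Rule 1: an exact match returns the associated prediction.
    if xs[i] == x:
        return fitted[i]
    # Rule 3: otherwise interpolate linearly between the two neighbors.
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = fitted[i - 1], fitted[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


xs, fitted = fit_isotonic([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0])
print(fitted)                             # [1.0, 2.5, 2.5, 4.0]
print(predict_isotonic(xs, fitted, 3.5))  # 3.25
```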

Robust Regression

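Robust regression is a method from robust estimation in statistics. Its main idea is to modify the objective function of classical least squares regression, which is highly sensitive to outliers. Classical least squares regression takes minimizing the sum of squared errors as its objective. Because the variance is a non-robust statistic, least squares regression is a non-robust method. Different objective functions define different robust regression methods. Common ones include the least median of squares (LMS) method and M-estimation.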
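
As one concrete instance of M-estimation, the sketch below fits a Huber-type estimator by iteratively reweighted least squares. The tuning constant 1.345, the MAD-based scale estimate, and the function name `huber_irls` are assumptions made for this example; other loss functions and scale estimates are equally valid choices.

```python
import numpy as np


def huber_irls(X, y, k=1.345, n_iter=50):
    """Huber M-estimate of beta in y ≈ X @ beta via reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale estimate from the median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale < 1e-12:
            break
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, k/u for large ones.
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


# A line y = 1 + 2x with one gross outlier at the last point.
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x
y[-1] += 100.0
X = np.column_stack([np.ones_like(x), x])
print("OLS:  ", np.linalg.lstsq(X, y, rcond=None)[0])
print("Huber:", huber_irls(X, y))
```

The outlier drags the ordinary least squares slope well away from 2, while the reweighted fit stays much closer to it because large residuals receive small weights.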




Locally weighted linear regression


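Locally weighted linear regression is in fact a non-parametric learning algorithm, whereas ordinary linear regression is a parametric learning algorithm: its parameters are fixed once fitted, while the parameters of locally weighted linear regression change with each prediction point.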
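
The sketch below shows this dependence on the prediction point for a single feature. The Gaussian kernel, the bandwidth name `tau`, and the function name `lwlr_predict` are assumptions made for this example.

```python
import numpy as np


def lwlr_predict(x0, x, y, tau=0.5):
    """Fit a weighted line around the query point x0.

    Returns (prediction at x0, local parameters [intercept, slope]).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Training points near x0 get weights close to 1, distant ones close to 0.
    w = np.exp(-((x - x0) ** 2) / (2.0 * tau ** 2))
    sw = np.sqrt(w)
    Xb = np.column_stack([np.ones_like(x), x])
    theta = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)[0]
    return theta[0] + theta[1] * x0, theta


x = np.linspace(-5.0, 5.0, 101)
y = x ** 2
for x0 in (0.0, 2.0):
    pred, theta = lwlr_predict(x0, x, y)
    print(x0, pred, theta)  # the local slope differs between the two query points
```

Because a new weighted fit is solved for every query, the whole training set has to be kept around at prediction time, which is what makes the method non-parametric.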


